[WIP] 554 Remove skimage dependency in get_largest_connect_component #579

YuanTingHsieh · 2020-06-16T23:56:18Z

Fixes #554 .

Description

Remove skimage measure function and implement our own.

Status

Work in progress

Types of changes

Bug fix (non-breaking change which fixes an issue)
Breaking change (fix or new feature that would cause existing functionality to change)
New tests added to cover the changes
Docstrings/Documentation updated

…d-component-mask

YuanTingHsieh · 2020-06-16T23:56:46Z

/black

Nic-Ma · 2020-06-17T10:59:34Z

Hi @YuanTingHsieh ,

Thanks for your quick PR.
As we discussed offline, you are right, the traditional method to find connected component is DFS or BFS as your PR, and the skimage lib uses UnionFind method from papers: https://sdm.lbl.gov/~kewu/ps/LBNL-56864.pdf and https://sdm.lbl.gov/~kewu/ps/paa-final.pdf.
I did some analysis to profile the time consuming of your PR:

Most of the time is in the red part, which is skipped in the UnionFind method.
So I think maybe we need to change to UnionFind method for better perf, and I found several open source python practices for your reference:
https://github.com/mahdiyousofun/connected-components-lables
https://github.com/danielenricocahall/Connected-Components-Labeling

Do you think it's doable to quickly evaluate it?
If too much effort needed, let's discuss it.

Thanks.

Nic-Ma · 2020-06-22T11:07:32Z

Hi @YuanTingHsieh ,

I tried to adjust the program in https://github.com/mahdiyousofun/connected-components-lables and run locally. The perf is still much faster than BFS but still slower than skimage:
Tested 3 times on 512 x 512 binary image:

skimage:  0.00779271125793457s
UnionFind:  0.4875638484954834s
BFS:  7.813398122787476s

skimage:  0.007299900054931641s
UnionFind:  0.43839311599731445s
BFS:  7.3031721115112305s

skimage:  0.006865262985229492s
UnionFind:  0.4505500793457031s
BFS:  16.625162839889526s

I think maybe if we want to improve perf, we need to implement in C++, just like skimage.
So let's keep this ticket open and remove from next release.

here is my test program:

import math, random
from itertools import product
from .ufarray import *
import numpy as np


class UFarray:
    def __init__(self):
        # Array which holds label -> set equivalences
        self.P = []

        # Name of the next label, when one is created
        self.label = 0

    def makeLabel(self):
        r = self.label
        self.label += 1
        self.P.append(r)
        return r
    
    # Makes all nodes "in the path of node i" point to root
    def setRoot(self, i, root):
        while self.P[i] < i:
            j = self.P[i]
            self.P[i] = root
            i = j
        self.P[i] = root

    # Finds the root node of the tree containing node i
    def findRoot(self, i):
        while self.P[i] < i:
            i = self.P[i]
        return i
    
    # Finds the root of the tree containing node i
    # Simultaneously compresses the tree
    def find(self, i):
        root = self.findRoot(i)
        self.setRoot(i, root)
        return root
    
    # Joins the two trees containing nodes i and j
    # Modified to be less agressive about compressing paths
    # because performance was suffering some from over-compression
    def union(self, i, j):
        if i != j:
            root = self.findRoot(i)
            rootj = self.findRoot(j)
            if root > rootj: root = rootj
            self.setRoot(j, root)
            self.setRoot(i, root)
    
    def flatten(self):
        for i in range(1, len(self.P)):
            self.P[i] = self.P[self.P[i]]
    
    def flattenL(self):
        k = 1
        for i in range(1, len(self.P)):
            if self.P[i] < i:
                self.P[i] = self.P[self.P[i]]
            else:
                self.P[i] = k
                k += 1


def label_connected_regions_2d(data):
    height, width = data.shape
 
    # Union find data structure
    uf = UFarray()
 
    #
    # First pass
    #
 
    # Dictionary of point:label pairs
    labels = {}
 
    for x, y in product(range(height), range(width)):
 
        #
        # Pixel names were chosen as shown:
        #
        #   -------------
        #   | a | b | c |
        #   -------------
        #   | d | e |   |
        #   -------------
        #   |   |   |   |
        #   -------------
        #
        # The current pixel is e
        # a, b, c, and d are its neighbors of interest
        #
        # 255 is white, 0 is black
        # White pixels part of the background, so they are ignored
        # If a pixel lies outside the bounds of the image, it default to white
        #
 
        # If the current pixel is white, it's obviously not a component...
        if data[x, y] == 0:
            pass
 
        # If pixel b is in the image and black:
        #    a, d, and c are its neighbors, so they are all part of the same component
        #    Therefore, there is no reason to check their labels
        #    so simply assign b's label to e
        elif x > 0 and data[x-1, y] == 1:
            labels[x, y] = labels[x-1, y]
 
        # If pixel c is in the image and black:
        #    b is its neighbor, but a and d are not
        #    Therefore, we must check a and d's labels
        elif y+1 < width and x > 0 and data[x-1, y+1] == 1:
 
            c = labels[x-1, y+1]
            labels[x, y] = c
 
            # If pixel a is in the image and black:
            #    Then a and c are connected through e
            #    Therefore, we must union their sets
            if y > 0 and data[x-1, y-1] == 1:
                a = labels[x-1, y-1]
                uf.union(c, a)
 
            # If pixel d is in the image and black:
            #    Then d and c are connected through e
            #    Therefore we must union their sets
            elif y > 0 and data[x, y-1] == 1:
                d = labels[x, y-1]
                uf.union(c, d)
 
        # If pixel a is in the image and black:
        #    We already know b and c are white
        #    d is a's neighbor, so they already have the same label
        #    So simply assign a's label to e
        elif x > 0 and y > 0 and data[x-1, y-1] == 1:
            labels[x, y] = labels[x-1, y-1]
 
        # If pixel d is in the image and black
        #    We already know a, b, and c are white
        #    so simpy assign d's label to e
        elif y > 0 and data[x, y-1] == 1:
            labels[x, y] = labels[x, y-1]
 
        # All the neighboring pixels are white,
        # Therefore the current pixel is a new component
        else: 
            labels[x, y] = uf.makeLabel()
 
    #
    # Second pass
    #
 
    uf.flatten()

    output = np.zeros_like(data)
    color = {}
    counter = 0
    for (x, y) in labels:
 
        # Name of the component the current point belongs to
        component = uf.find(labels[x, y])

        # Update the labels with correct information
        labels[x, y] = component
        if component not in color:
            counter += 1
            color[component] = counter 

        # Colorize the image
        output[x, y] = color[component]

    return output

Thanks.

YuanTingHsieh · 2020-06-22T20:29:44Z

Hi Nic,

Thanks for trying 2D.
In that case is it really necessary to implement the same algorithm ourselves in C++?
If I am going to implement this in C++ I will still follow the logic in skimage.
And note that the implementation of skimage is restricted to 1, 2 or 3D.
https://github.com/scikit-image/scikit-image/blob/v0.17.2/skimage/measure/_ccomp.pyx#L90

So maybe we should add something in the docstring.

Thanks

Nic-Ma · 2020-06-22T22:31:53Z

Hi @YuanTingHsieh ,

Yes, I think it's necessary to implement it in C++ to improve the perf. But let's hold on this task for a while because we are still discussing how to add the C++ or CUDA code in MONAI(need to make the code) and the CI for them.

Thanks.

YuanTingHsieh added 2 commits June 16, 2020 16:54

Remove skimage dependency

36d97cd

Merge branch 'master' into 554-remove-skimage-in-get-largest-connecte…

4885d4f

…d-component-mask

YuanTingHsieh changed the title ~~[WIP] Remove skimage dependency in get_largest_connect_component~~ [WIP] 554 Remove skimage dependency in get_largest_connect_component Jun 16, 2020

YuanTingHsieh closed this Nov 19, 2020

YuanTingHsieh deleted the 554-remove-skimage-in-get-largest-connected-component-mask branch November 19, 2020 22:57

YuanTingHsieh restored the 554-remove-skimage-in-get-largest-connected-component-mask branch November 19, 2020 22:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] 554 Remove skimage dependency in get_largest_connect_component #579

[WIP] 554 Remove skimage dependency in get_largest_connect_component #579

YuanTingHsieh commented Jun 16, 2020

YuanTingHsieh commented Jun 16, 2020

Nic-Ma commented Jun 17, 2020

Nic-Ma commented Jun 22, 2020

YuanTingHsieh commented Jun 22, 2020

Nic-Ma commented Jun 22, 2020

[WIP] 554 Remove skimage dependency in get_largest_connect_component #579

[WIP] 554 Remove skimage dependency in get_largest_connect_component #579

Conversation

YuanTingHsieh commented Jun 16, 2020

Description

Status

Types of changes

YuanTingHsieh commented Jun 16, 2020

Nic-Ma commented Jun 17, 2020

Nic-Ma commented Jun 22, 2020

YuanTingHsieh commented Jun 22, 2020

Nic-Ma commented Jun 22, 2020